DUDLEY KNOX LIBRARY
NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIFORNIA






NAVAL POSTGRADUATE SCHOOL 

Monterey, California 




THESIS 



AN EMPIRICAL STUDY OF A REFORMULATION OF 
THE CUMULATIVE AVERAGE LEARNING CURVE 



by 

David George Jenkins 
March 1986 



Thesis Advisor: Dan C. Boger



Approved for public release; distribution is unlimited 







SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
1b. RESTRICTIVE MARKINGS:
2a. SECURITY CLASSIFICATION AUTHORITY:
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE:
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release;
   distribution is unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S):
5. MONITORING ORGANIZATION REPORT NUMBER(S):
6a. NAME OF PERFORMING ORGANIZATION: Naval Postgraduate School
6b. OFFICE SYMBOL (If applicable): Code 55
6c. ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000
7a. NAME OF MONITORING ORGANIZATION: Naval Postgraduate School
7b. ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000
8a. NAME OF FUNDING/SPONSORING ORGANIZATION:
8b. OFFICE SYMBOL (If applicable):
8c. ADDRESS (City, State, and ZIP Code):
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER:
10. SOURCE OF FUNDING NUMBERS (PROGRAM ELEMENT NO. / PROJECT NO. /
    TASK NO. / WORK UNIT ACCESSION NO.):
11. TITLE (Include Security Classification): AN EMPIRICAL STUDY OF A
    REFORMULATION OF THE CUMULATIVE AVERAGE LEARNING CURVE
12. PERSONAL AUTHOR(S): Jenkins, David George
13a. TYPE OF REPORT: Master's Thesis
13b. TIME COVERED: FROM        TO
14. DATE OF REPORT (Year, Month, Day): 1986 March
15. PAGE COUNT: 138
16. SUPPLEMENTARY NOTATION:
17. COSATI CODES (FIELD / GROUP / SUB-GROUP):
18. SUBJECT TERMS (Continue on reverse if necessary and identify by
    block number): Learning Curve, Linear Regression, Non-Linear
    Regression, Autocorrelation

19. ABSTRACT (Continue on reverse if necessary and identify by block number)

One aspect of efficient management of resources that cannot be 
overstated is accurate cost estimation. The learning curve technique 
used in cost estimation continues to be a significant tool by itself 
and as an important factor in other cost estimation algorithms. This 
study conducts an empirical investigation of a theoretical reformulation 
of the cumulative average learning curve. The model is empirically 
corroborated by comparison of linear and nonlinear regression results 
with the classical unit and cumulative average learning curve 
specifications using two sets of aircraft production data. When 
autocorrelation was present and subsequently modeled into the data, 
the resulting linear models were significantly distorted whereas 
the non-linear models were not. While the model being scrutinized 
was adequate, the unit learning curve appeared to be the superior model. 



20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: ☒ UNCLASSIFIED/UNLIMITED
    □ SAME AS RPT  □ DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION: UNCLASSIFIED
22a. NAME OF RESPONSIBLE INDIVIDUAL: Dan C. Boger
22b. TELEPHONE (Include Area Code): (408) 646-2607
22c. OFFICE SYMBOL: 54 BO

DD FORM 1473, 84 MAR
83 APR edition may be used until exhausted
All other editions are obsolete

SECURITY CLASSIFICATION OF THIS PAGE



Approved for public release; distribution is unlimited 



An Empirical Study of a Reformulation of 
the Cumulative Average Learning Curve 



by 



David George Jenkins 
Lieutenant, United States Navy 
B.S., United States Naval Academy, 1978 



Submitted in partial fulfillment of the
requirements for the degree of 

MASTER OF SCIENCE IN OPERATIONS RESEARCH 

from the 



NAVAL POSTGRADUATE SCHOOL 
March 1986 



ABSTRACT 



One aspect of efficient management of resources that 
cannot be overstated is accurate cost estimation. The 
learning curve technique used in cost estimation continues 
to be a significant tool by itself and as an important 
factor in other cost estimation algorithms. This study 
conducts an empirical investigation of a theoretical 
reformulation of the cumulative average learning curve. The 
model is empirically corroborated by comparison of linear 
and nonlinear regression results with the classical unit and 
cumulative average learning curve specifications using two
sets of aircraft production data. When autocorrelation was 
present and subsequently modeled into the data, the 
resulting linear models were significantly distorted whereas 
the non-linear models were not. While the model being 
scrutinized was adequate, the unit learning curve appeared 
to be the superior model. 






TABLE OF CONTENTS

I.   INTRODUCTION
     A. BACKGROUND
     B. OBJECTIVES

II.  THE MODELS
     A. CUMULATIVE AVERAGE LEARNING CURVE
     B. UNIT LEARNING CURVE
     C. BOGER, JONES, AND SONTHEIMER MODEL

III. DATA
     A. GENERAL
     B. REFINEMENT
        1. Cumulative Average Learning Curve
        2. Unit Learning Curve
        3. Boger, Jones, and Sontheimer Model

IV.  METHODOLOGY
     A. LINEAR REGRESSION
     B. NON-LINEAR REGRESSION
     C. DATA ANALYSIS
        1. Autocorrelation
        2. Outliers
        3. Normality of Error Terms
        4. Homoskedasticity
     D. INFERENCES CONCERNING PARAMETER ESTIMATION
     E. VALIDATION
     F. COMPARISONS
        1. Regression Models
        2. Learning Curve Specifications

V.   RESULTS
     A. DATA ANALYSIS
        1. Boger, Jones, and Sontheimer Model: C-141 Data Analysis
     B. VALIDATION
     C. STRUCTURAL ANALYSIS
     D. COMPARISON OF FITTED MODELS
     E. COMPARISON OF FITTED LOT COSTS

VI.  CONCLUSIONS

APPENDIX A: ADJUSTED CUMULATIVE AVERAGE LEARNING CURVE
APPENDIX B: ADJUSTED UNIT LEARNING CURVE DATA
APPENDIX C: ADJUSTED BOGER ET AL LEARNING CURVE DATA
APPENDIX D: BOGER ET AL MODEL: C-141 DATA ANALYSIS RESULTS
APPENDIX E: UNIT LEARNING CURVE: C-141 DATA ANALYSIS RESULTS
APPENDIX F: CUMULATIVE AVERAGE LEARNING CURVE: C-141 DATA
            ANALYSIS RESULTS
APPENDIX G: BOGER ET AL MODEL: F-102 DATA ANALYSIS RESULTS
APPENDIX H: UNIT LEARNING CURVE: F-102 DATA ANALYSIS RESULTS
APPENDIX I: CUMULATIVE AVERAGE LEARNING CURVE: F-102 DATA
            ANALYSIS RESULTS
APPENDIX J: FITTED MODEL PLOTS
APPENDIX K: FITTED LOT COST PLOTS

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST









LIST OF TABLES

I.    PERCENT OF TOTAL MANHOURS ALLOCATED TO SPECIFIC
      ACTIVITIES BY CONTRACT
II.   F-102 PERCENT DETAIL HOURS COMPLETED
III.  F-102 PERCENT ASSEMBLY HOURS COMPLETED
IV.   LINEAR REGRESSION 1 RESULTS
V.    LINEAR REGRESSION 2 RESULTS
VI.   LINEAR REGRESSION 3 RESULTS
VII.  NONLINEAR REGRESSION 1 RESULTS
VIII. NONLINEAR REGRESSION 2 RESULTS
IX.   PREDICTION ACCURACY MEASURES -- ENTIRE HOLDOUT SAMPLE
X.    PREDICTION ACCURACY MEASURES -- PORTION OF HOLDOUT SAMPLE






LIST OF FIGURES

1.  Raw Data and LN Transformed Data
2.  Residual
3.  LN Transformed Data Adjusted for Autocorrelation
4.  Residual Plots
5.  Normal Probability Plot of Residuals
6.  Residual Plot
7.  Normal Probability and Density Plots
8.  Residual Plot
9.  LN Transformed Data, Autocorrelation Transformation,
    First Observation Dropped
10. Residual Plot
11. Normal Probability and Density Plots
12. Fitted Model Results: Boger et al. Model, C-141 Data
13. Fitted Lot Costs Results: Boger et al. Model,
    Nonlinear Regression, C-141 Data



I. INTRODUCTION



A. BACKGROUND 

In March of 1972, the General Accounting Office sent a 
preliminary report to Congress dealing with the acquisition 
of major weapon systems [Ref. 1:p. 1]. The GAO reported
that the Navy had experienced a cost growth of $19 billion 
on twenty-four weapon systems in FY 1971, of which 15 
percent was attributed to poor cost estimation. Inaccurate 
cost estimates for weapon systems can result in program 
delays, cost overruns, acquisition of systems that are not 
the most cost effective, and a lack of taxpayer confidence 
in military leaders, to name only a few of the consequences.
Congressional concern and a continuing need for better 
planning estimates have made it imperative that new 
techniques be developed and old methods be improved to 
obtain better cost estimates for major weapon system 
production and acquisition [Ref. 2:p. 1]. In the area of
cost estimation, an old technique that continues to be a
significant tool is the learning curve.

The first study addressing the learning curve phenomenon 
was documented by the pioneer of the learning curve, T. P. 
Wright of the Curtiss-Wright Corporation, in his 1936 paper,
"Factors Affecting the Cost of Airplanes" [Ref. 3:p. 32].
Analysis of the data collected for a number of years
beginning in 1922 concerned the relationship of production
quantity with cost as measured in direct labor hours. 
Wright claimed that each time the cumulative production
quantity doubled, the average unit cost for that quantity
decreased by a constant percentage, and that this relationship
plotted as a straight line on logarithmic paper. Wright's 
formulation of the learning curve was: 



$$Y_c = aX^b$$

where

X: cumulative production quantity

$Y_c$: average cost per unit

b: factor of cost variation

a: direct manhour cost for production unit number one

Based on most of the literature available, it can safely 
be said that the principal factors contributing to the 
existence of this learning phenomenon include considerably 
more than just operator learning. Conway and Schultz 
[Ref. 4:p. 42] believe that learning in aircraft production
is influenced by a number of during-production factors
including:

1) incentive pay

2) changes in tooling

3) design changes

4) management learning

5) volume changes

6) quality improvements



The rate of a learning curve is usually described by the 
complement of the reduction achieved when the production 
quantity is doubled. This value is usually called the slope 
of the curve and is found: 



$$S = \frac{Y_{2X}}{Y_X} = \frac{a(2X)^b}{aX^b} = 2^b$$

where

b: slope of the learning curve on logarithmic scales

S: fraction to which the cost falls when the quantity doubles
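
The relationship S = 2^b can be inverted to move between a quoted
percentage slope and the exponent b. The short sketch below is only
an illustration of that arithmetic; the function names and the "80
percent" example value are assumptions, not taken from the thesis.

```python
import math


def slope_from_exponent(b: float) -> float:
    """Return S = 2**b, the fraction of cost remaining after a doubling."""
    return 2.0 ** b


def exponent_from_slope(s: float) -> float:
    """Invert S = 2**b to recover the exponent b = log2(S)."""
    return math.log2(s)


# Illustrative value only: a so-called "80 percent" learning curve.
b = exponent_from_slope(0.80)
print(round(b, 4))                       # -0.3219
print(round(slope_from_exponent(b), 2))  # 0.8
```

A negative b corresponds to S below one, that is, to cost falling as
the quantity doubles.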



Wright believed that the cumulative average learning 
phenomenon plotted linearly on logarithmic scales and the 
unit learning curve formulation derived from this cumulative 
equation would be [Ref. 5:p. 266]: 



$$Y_c = aX^b$$

$$Y_T = Y_c \cdot X = aX^{b+1}$$

So

$$Y_X = a\left[X^{b+1} - (X-1)^{b+1}\right] \approx a(b+1)X^b \quad \text{as } X \to \infty$$

where

$Y_c$: average cost per unit

$Y_T$: total cumulative cost

$Y_X$: cost of the Xth unit

a, b: parameters of the formulation
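
As a rough numerical check of this limit, the sketch below compares
the exact unit cost implied by Wright's cumulative average curve with
the approximation a(b + 1)X^b. The parameter values and function names
are hypothetical and chosen only for illustration.

```python
def wright_unit_cost(a: float, b: float, x: int) -> float:
    """Exact cost of unit x implied by the cumulative average Y_c = a*X**b."""
    return a * (x ** (b + 1) - (x - 1) ** (b + 1))


def wright_unit_cost_approx(a: float, b: float, x: int) -> float:
    """Large-X approximation a*(b + 1)*X**b of the same unit cost."""
    return a * (b + 1) * x ** b


# Hypothetical parameters: first unit cost 1000 hours, roughly an 80% curve.
a, b = 1000.0, -0.3219
for x in (2, 10, 100, 1000):
    exact = wright_unit_cost(a, b, x)
    approx = wright_unit_cost_approx(a, b, x)
    print(x, round(exact, 2), round(approx, 2), round(exact / approx, 4))
```

The ratio in the last column moves toward one as X grows, which is the
sense in which the two expressions agree for large quantities.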

J. R. Crawford, another major contributor to the 
literature and theory of learning curves, disagreed with 
T. P. Wright in the log-linear formulation of the cumulative 
average learning curve [Ref. 6:p. 21]. His disagreement was 
based on the apparently steep slope between early production 
units of the unit learning curve derived from the cumulative 
curve. In Crawford's studies, he described the learning 
phenomenon in what has been termed the unit learning curve: 



$$Y_X = aX^b$$

where

$Y_X$: cost of the Xth unit

X: cumulative amount of units produced

a: manhour cost for the first production unit

b: factor of cost variation



The cumulative average cost curve derived from the unit 
curve is [Ref. 6:p. 21]: 

$$Y_X = aX^b$$

$$Y_T = a \sum_{x=1}^{X} x^b$$

$$Y_c = \left(a \sum_{x=1}^{X} x^b\right) / X \approx \frac{a}{1+b} X^b \quad \text{as } X \to \infty$$

where

$Y_X$: cost of the Xth unit

$Y_T$: total cumulative cost

$Y_c$: average cost per unit produced

a, b: formulation parameters
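
The same kind of check can be made in the other direction. The sketch
below sums Crawford's unit costs directly and compares the resulting
cumulative average with the approximation (a/(1 + b))X^b. The
parameters and function names are again hypothetical.

```python
def crawford_cum_avg(a: float, b: float, x: int) -> float:
    """Cumulative average cost through unit x when unit k costs a*k**b."""
    total = sum(a * k ** b for k in range(1, x + 1))
    return total / x


def crawford_cum_avg_approx(a: float, b: float, x: int) -> float:
    """Large-X approximation (a / (1 + b)) * X**b of the cumulative average."""
    return a / (1 + b) * x ** b


# Hypothetical parameters, for illustration only.
a, b = 1000.0, -0.3219
for x in (10, 100, 1000):
    exact = crawford_cum_avg(a, b, x)
    approx = crawford_cum_avg_approx(a, b, x)
    print(x, round(exact, 2), round(approx, 2), round(exact / approx, 4))
```

For small X the two values differ noticeably, which is consistent with
the concern raised in the text about fitting curves to early units.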

For years both the unit learning curve and the
cumulative average learning curve have been used almost
interchangeably. Womer and Patterson [Ref. 5:p. 266] show
that this is so because, for large values of X, each
curve is a good approximation for the other. They go on to
say that a problem arises, however, since learning curves
are generally formulated on the first few units of output to
forecast the cost of an entire production. Even though
forecasts may be for large values of X, the data used to
make them are not. Under these circumstances, the estimated
cumulative average learning curve, for example, may approach
a unit learning curve, but not necessarily the same unit
curve that would be approximated from early units. The
choice between the two log-linear learning curve
specifications, unit or cumulative, has through the years
been a source of inaccurate cost estimation. Although 93
percent of all firms utilize Crawford's unit learning curve
[Ref. 7:p. 23], there are sufficient exceptions to the use
of this unit curve to suggest that experience is the best
guide for choosing a particular model.

Following World War II, Gardner Carr of the McDonnell
Aircraft Corporation felt that representing learning curves
as linear on logarithmic paper was an inaccurate portrayal
of the learning phenomenon. In his April 1946 article
[Ref. 8:p. 77], Carr argued that the straight line was
adequate for overall project statistics but was rarely
correct for budget or actual cost finding purposes. He
believed that the cumulative average learning curve was
S-shaped on the logarithmic scale. Explanations for the
various segment shapes of this curve are found in a RAND
report by Asher, "Cost Quantity Relationships in the
Airframe Industry" [Ref. 6:p. 28].



Another study which suggested that learning curves do
not adhere to log-linearity was conducted by the Stanford
Research Institute following World War II. The Stanford
system utilizes the 'B-factor' which, basically, modifies
the standard learning curve for prior experience. The
formulation of this learning curve is:

$$Y = a(X + B)^b$$

where

Y: cost per unit in manhours

a: theoretical first unit cost

X: cumulative quantity produced

B: modification factor

The effect of this formulation is a concave curve on the
logarithmic scale. The cost of the first unit is depressed
and the curve arcs to the standard learning curve
[Ref. 7:p. 8].

Further research that deviated from the log-linearity
hypothesis was conducted. Another perspective of the
production process is that various departments contribute to
the overall quantity of direct labor hours. Generally
speaking, these departments are fabrication, subassembly,
major assembly, and final assembly. It seems obvious that
each department contributing to the learning curve would
itself have its own learning curve. In order for the various



departments to have their learning effects sum to an overall
production process log-linear learning curve, each of the
department slopes must be identical. In practice, the
various departments often have different slopes. Summing
these curves would result in a departure from log-linearity
and arrive at a convex curve whose slope is bounded by the
flattest of the component curves. In "Cost Quantity
Relationships in the Airframe Industry" [Ref. 6:p. 69],
Asher uses this argument while conducting a significant
analysis disputing the log-linear hypothesis of the
formulation of the learning curve. In his report, he also
cites research done previously by P. B. Crouse, G. M.
Giannini, and P. Guibert supporting his contentions. Asher
concludes, however, that his study

. . . does not discredit the use of the linear progress
curve . . . . The linear curve is useful for making
extrapolations beyond the data range provided the number
of additional units is small. It is clearly a matter of
judgement whether or not in a specific instance the linear
curve is appropriate . . . . If allowable error is
relatively small, a convex curve resulting from predicting
each of the component curves separately is probably more
appropriate.

Another approach to research in the theory of learning 
curves has involved the inclusion of production rate as an 
explanatory variable in learning curve models. In Alchian's 
1963 article [Ref. 9:p. 679], he cites work done in 1948
that concluded production rate is not a relevant variable.
Whereas results published by Smith [Ref. 10:p. 138], and
supported by Kinton and Congelton [Ref. 11:p. 92], concluded
that production rate plays a significant role in explaining
the effects of learning, other studies with contradictory
results exist. Womer and Gulledge have produced a
considerable literature discussing the effects of production
rate, which resulted in a final report for the Air Force
[Ref. 12:p. 5] addressing the contradictory results of
previous research, and they develop a cost function
including production rate and the cost-quantity relationship
of learning curve theory.

In his article "The Learning Curve: Historical Review
and Comprehensive Survey" [Ref. 13:p. 302], Yelle states that
most of the literature in learning curve theory, from its
inception through the 1960's, has focused primarily on
military applications in the early years through World War
II and on industry and business in the more recent years.
Through the years and the various paths that research in
this area has followed, most of the studies have not reached
consistent conclusions. The early goals of developing a
general formulation of the learning curve that could be
applied to the entire aircraft industry or subsets of it
were quickly abandoned. Despite the vast amounts of
literature disputing the log-linear relationship between
cost and cumulative quantity produced, the unit learning
curve is still the most widely used formulation of the
learning curve used in cost estimation today [Ref. 7:p. 7].



B. OBJECTIVES 



The preceding pages and references provide a brief
summary of the research expended on the theory of the
learning curve over the past half century. The important
point is that the learning phenomenon and the numerous
formulations of this theory in aircraft and other industries
have been an area of extensive research, and the learning
curve continues to be a viable tool in the world of
production economics.

The purpose of this research is to conduct an empirical
study of still another theoretical reformulation of the
learning curve. In "Budgets, Contracts, Incentives and
Costs: A Stylized Nexus", by Boger, Jones and Sontheimer
[Ref. 14:p. 23], the cumulative average learning curve is
reformulated to examine the influence cost forecasting and
budget formation have on the incentives bearing on the firm
for cost control. The model developed by Boger et al., a
cumulative average learning curve model, and a unit learning
curve model will be estimated through simple linear and
nonlinear regression techniques using several sets of
aircraft production data. For each formulation of the
learning curve, the models resulting from the two fitting
techniques will be analyzed, validated, and compared.
Finally, the Boger et al. model will be compared with the
classical learning curve models for empirical validation.



II. THE MODELS

A. CUMULATIVE AVERAGE LEARNING CURVE 

The cumulative average learning curve, as discussed 
above, was first formulated by T. P. Wright in the 1930's. 
The log-linear relationship between cumulative production 
quantity and average cost per unit is: 



$$Y_c = aX^b$$

where

X: cumulative production quantity

$Y_c$: average cost per unit

b: factor of cost variation

a: direct manhour cost for first unit

The cumulative production quantity is usually expressed 
as an integer number of units produced. The cost variable 
is measured in direct manhours expended in the production of 
the cumulative quantity produced. We expect the learning 
curve slope, factor of cost variation, to have a negative 
value when we anticipate the presence of learning in the 
production of some product. This formulation also 
presupposes a relatively constant rate of production and 
uniformity of units produced. Deviations from these last
assumptions are recognizable in a plot of the raw data,
i.e., toe up, toe down, bottom out, scallop.



B. UNIT LEARNING CURVE 

The unit learning curve, as also discussed above, was 
first formulated by J. R. Crawford. He disagreed with 
Wright's log-linear formulation of the cumulative average 
learning curve. Crawford believed the relationship between 
cumulative quantity produced and the cost of the final unit 
of that quantity was log-linear and was formulated as: 



$$Y_X = aX^b$$

where

$Y_X$: cost of the final unit

X: cumulative quantity produced

a: direct manhour cost for first unit

b: factor of cost variation

The same comments and assumptions concerning the cumulative 
average learning curve apply. 

C. BOGER, JONES, AND SONTHEIMER MODEL 

Boger, Jones, and Sontheimer express the costs of
production over a time period as opposed to over the 
production of cumulative units regardless of time. They use 
the cumulative average learning curve as the starting point 
in their formulation. 






As discussed above, the typical cumulative average 
learning curve is of the form: 

$$Y(t) = aQ(t)^b \qquad (1)$$

where now

Y(t): average cost per unit

Q(t): cumulative quantity of units produced through time t

a, b: learning curve parameters

The typical progress function (learning curve) treats the
inputs as varying continuously and causing a related
continuous variation in some product (output)
[Ref. 14:p. 23]. From (1) we can derive an expression for
total cost:

$$Q(t) \cdot Y(t) = aQ(t)^b \, Q(t)$$

$$X(t) = aQ(t)^{b+1} \qquad (2)$$

where

X(t): total quantity of inputs consumed by the production
of Q(t)

This specification yields the following marginal
requirements, dX, for an incremented output, dQ:

$$\frac{dX}{dQ} = a(b+1)Q^b \qquad (3)$$



Now, assume the product emerges in quantities at discrete 
time intervals. That is, we now develop an algorithm using 
the cumulative average learning curve formulation based on 
how many units are produced in a specified time period. In 
application, we assume that progress or cost per quantity is 
proportional to productivity achieved in prior production: 



$$X_t = \delta_t \frac{X_{t-1}}{q_{t-1}} q_t \qquad (4)$$

where

$q_t = dQ$: amount produced in time period t

$X_t = dX$: inputs used in time period t

$\delta_t$: proportionality constant

We assume that learning is derived not only from the
preceding period but from all the production prior to the
period we are in. So we first set:

$$\frac{X_t}{q_t} = \frac{dX}{dQ} = a(b+1)Q^b$$

where

Q = Q(t)

Substituting (4) we get:

$$\delta_t \frac{X_{t-1}}{q_{t-1}} \frac{q_t}{q_t} = a(b+1)Q^b$$

$$\delta_t \frac{X_{t-1}}{q_{t-1}} q_t = a(b+1)Q^b q_t \qquad (5)$$



We now let Q, the quantity of units produced up to time t, 
be equal to the quantity of units produced through time 
period t-1. Now, substituting into (5): 



$$\delta_t \frac{X_{t-1}}{q_{t-1}} q_t = a(b+1)\left[\sum_{j=1}^{t-1} q_j\right]^b q_t \qquad (6)$$

Equation (6) assumes learning in period t is derived only 
from production in period t-1. We assume this relationship 
must hold at previous time periods also. So rewriting (4) 
and (5) for period t-1:

$$X_{t-1} = \delta_{t-1} \frac{X_{t-2}}{q_{t-2}} q_{t-1} = a(b+1)\,(Q^*)^b\, q_{t-1}$$

where

$Q^*$: amount of units produced through time period t-2

Therefore,

$$X_{t-1} = a(b+1)\left[\sum_{j=1}^{t-2} q_j\right]^b q_{t-1}$$

which leads to:

$$\frac{X_{t-1}}{q_{t-1}} = a(b+1)\left[\sum_{j=1}^{t-2} q_j\right]^b$$



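and substituting into (6):

$$\delta_t \, a(b+1)\left[\sum_{j=1}^{t-2} q_j\right]^b q_t = a(b+1)\left[\sum_{j=1}^{t-1} q_j\right]^b q_t$$

$$\delta_t = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \quad \text{for } t = 3, 4, 5, \ldots, T \qquad (7)$$

Now substituting (7) into (4) we have:

$$X_t = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \frac{X_{t-1}}{q_{t-1}} q_t$$

$$\frac{X_t}{q_t} = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \frac{X_{t-1}}{q_{t-1}}$$

Since this is true for all time periods, we can say: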



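$$\frac{X_{t-1}}{q_{t-1}} = \left[\frac{\sum_{j=1}^{t-2} q_j}{\sum_{j=1}^{t-3} q_j}\right]^b \frac{X_{t-2}}{q_{t-2}}$$

$$\frac{X_{t-2}}{q_{t-2}} = \left[\frac{\sum_{j=1}^{t-3} q_j}{\sum_{j=1}^{t-4} q_j}\right]^b \frac{X_{t-3}}{q_{t-3}}$$

and so on.

So, substituting recursively we have

$$\frac{X_t}{q_t} = \frac{\left[\sum_{j=1}^{t-1} q_j\right]^b}{\left[\sum_{j=1}^{t-2} q_j\right]^b} \cdot \frac{\left[\sum_{j=1}^{t-2} q_j\right]^b}{\left[\sum_{j=1}^{t-3} q_j\right]^b} \cdot \frac{\left[\sum_{j=1}^{t-3} q_j\right]^b}{\left[\sum_{j=1}^{t-4} q_j\right]^b} \cdots \frac{\left[\sum_{j=1}^{2} q_j\right]^b}{\left[q_1\right]^b} \cdot \frac{X_2}{q_2}$$

$$= \left[\frac{\sum_{j=1}^{t-1} q_j}{q_1}\right]^b \frac{X_2}{q_2}$$

$$= Z \left[\frac{\sum_{j=1}^{t-1} q_j}{q_1}\right]^b$$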



where

Z: direct manhours per quantity produced in second
time period

$\sum_{j=1}^{t-1} q_j$: total quantity of units produced prior to present
time period

$q_1$: quantity of units produced in time period one

b: factor of cost variation

$X_t/q_t$: average cost in direct manhours of units produced
in time period t

The length of the time period, although it must remain fixed
over the data space, can be any length, e.g., day, month, or
quarter. The quantity produced in a particular time period
need not be an integer amount although partial units
produced are generally not found in aircraft production
data. As in the cumulative average and unit learning curve
formulations, we expect the factor of cost variation to have
a negative value. This model also presupposes uniformity
between production units and also a constant production
rate.
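
To make the final expression concrete, the sketch below evaluates the
predicted average cost per unit, Z[(q_1 + ... + q_{t-1})/q_1]^b, for
each period from the second onward. It is a minimal illustration that
assumes Z and b are already known; the function name, the quantities,
and the parameter values are hypothetical and are not the thesis data.

```python
from itertools import accumulate


def boger_average_cost(q, z, b):
    """Predicted average cost X_t/q_t for periods t = 2, ..., T.

    q : quantities produced in periods 1..T (q[0] is q_1)
    z : Z, direct manhours per unit produced in the second period
    b : factor of cost variation (negative when learning is present)
    """
    cumulative = list(accumulate(q))  # cumulative[i] = q_1 + ... + q_(i+1)
    # Period t (1-based) depends on production through period t-1,
    # which is stored at index t-2 of the cumulative list.
    return [z * (cumulative[t - 2] / q[0]) ** b for t in range(2, len(q) + 1)]


# Hypothetical quantities and parameters, for illustration only.
quantities = [2, 3, 5, 8]
costs = boger_average_cost(quantities, z=50000.0, b=-0.3)
for t, cost in enumerate(costs, start=2):
    print(t, round(cost, 1))
```

Taking logarithms of both sides gives ln(X_t/q_t) = ln Z + b ln[(q_1 +
... + q_{t-1})/q_1], which is linear in ln Z and b, so the model can be
fitted either in this log-linear form or directly in its nonlinear
form.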



III. DATA



A. GENERAL 

The dependent variable in each of the models investigated
will involve a cost of some type. In each of our
models this cost will be measured as a function of direct 
manhours expended in the production of some quantity of 
units. Direct manhours will be defined as those hours spent 
on fabrication, assembly, production flight, and other 
production work associated with the basic aircraft. All 
manhours pertaining to tooling, engineering, planning, 
testing and subcontracting are not included in this 
definition. It seems obvious that the way in which direct 
manhours are accumulated can, and does, lead to
inconsistencies due to differences in accounting systems from
contractor to contractor. The use of direct manhours has 
numerous advantages over the use of dollars as a measure of 
cost. In using direct manhours, we avoid the additional 
data computations involved in applying price indices to 
transform all dollar costs into constant dollars. We also 
avoid inaccuracies in the data caused by using price indices 
which are inexact figures. Finally, direct manhours is a 
variable comparable over a group of contractors whereas, due
to differences in wage rates from contractor to contractor,
costs measured in dollars are not the best tool for
comparison.

The data for this report include aircraft production 
data for the C-141 and F-102. The C-141 was produced by the 
Lockheed Corporation and the F-102 was produced by General 
Dynamics. The C-141 program produced 284 aircraft from July 
1962 through April 1968. The C-141 is a large, swept wing,
four jet engine cargo transport. The data for this study were
drawn from Orsini [Ref. 15:p. 104]. Orsini obtained the
data from C-141 Financial Management Reports prepared by the
contractor, Lockheed Aircraft Corporation, for the Air
Force. The C-141 data provided a large sample of data for
which a basic model of the aircraft was produced throughout
the production program. Uniformity between units produced
is a basic assumption in the application of the learning
curve theory. Orsini aggregated the monthly production data
into quarterly direct manhour production data, reducing the
total number of data points to twenty-four. Orsini felt
this quantity was sufficient for his analysis and the
current research is similarly restricted. The data
variables used by Orsini and this researcher are:

1) direct labor hours per lot per month

2) aircraft per lot

3) delivery dates of each aircraft



The F-102 program produced 1000 aircraft from 1953
through 1958. The F-102 is a single seat, supersonic, delta
wing, all-weather fighter. The data for this study were
drawn from Gulledge and Womer [Ref. 12:p. 73]. A
comprehensive cost breakdown by individual airframe was
provided by the "F-102 Program Cost History" document, the
source of the Womer and Gulledge data. The F-102 program
consisted of the production of F-102 airframes and TF-102
airframes. The TF-102 observations were not deleted for
the sake of strict uniformity; these data points were
retained since it was assumed that learning was
experienced in the production of these airframes. As Womer
and Gulledge note, the total manhours expended per airframe
can be disaggregated into three parts: details, assemblies,
and outside-of-factory labor. Total direct cost per
airframe is comprised of only detail and assembly hours.
The detail hours are comprised of fabrication hours, and
assembly hours include subassembly, major assembly, primary
assembly, and final assembly hours. After the portion of
labor hours expended per airframe outside the factory is
deleted, the total direct cost per airframe is left.

B. REFINEMENT 

As already discussed, three models will be utilized in 
the examination of two sets of aircraft production data. 
Parameter estimation for these models requires the data to
be in a particular form for each model. The C-141 production
data are available for aircraft grouped into production lots
and the F-102 production data are available for each
airframe. Since the models do not each fit the particular
form of each data set, adjustments and refinements need to
be made to the data to fit the different learning curve
formulations.

1. Cumulative Average Learning Curve

The data requirements for the cumulative average 
learning curve are rather straightforward. The independent 
variable is the cumulative quantity of aircraft produced. 
The dependent variable is the average amount of direct labor 
hours expended per unit in the production of the cumulative 
quantity produced. The F-102 and C-141 adjusted data used 
to fit the cumulative average learning curve are tabulated 
in Appendix A. 

The F-102 data consist basically of total hours expended
in the production of each airframe. This data set is easily
refined to meet the data requirements of the cumulative
average learning curve. As previously discussed, the F-102
total manhours per aircraft consisted of three parts:
details, assemblies, and outside of factory labor. Table I,
extracted from Womer and Gulledge [Ref. 12:p. 86], provided
the information necessary to translate the raw data into
direct manhours per airframe. Since this table only applied
to lots four through eleven, only these 204 observations
were utilized.

                          TABLE I

          PERCENT OF TOTAL MANHOURS ALLOCATED TO
             SPECIFIC ACTIVITIES BY CONTRACT

                                   Contract
                       5942   23903   29264   31174   33965
Fabrication           19.45   21.98   21.23   16.12   18.47
Assembly              65.82   70.56   64.82   66.27   61.62
Outside of Factory    14.73    7.46   13.95   17.61   19.91

The airframes in lots four through eleven were then ordered
with respect to delivery sequence number. It was this
sequence (1, 2, 3, ..., 204) that provided the independent
variable data vector. The sequence of cumulative sums of
direct manhours divided by the cumulative amount of
airframes delivered for each element of that sequence
provided the dependent variable data vector.
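
The construction of these two data vectors can be sketched as follows.
The function name and the three sample values are hypothetical; they
stand in for the per-airframe direct manhours ordered by delivery
sequence and are not the F-102 data.

```python
from itertools import accumulate


def cumulative_average_vectors(unit_hours):
    """Build (x, y) for the cumulative average curve from per-unit hours.

    unit_hours : direct manhours per airframe, in delivery sequence order
    x          : delivery sequence numbers 1, 2, ..., n
    y          : cumulative direct manhours divided by cumulative units
    """
    x = list(range(1, len(unit_hours) + 1))
    y = [total / n for n, total in zip(x, accumulate(unit_hours))]
    return x, y


# Hypothetical per-airframe hours, for illustration only.
x, y = cumulative_average_vectors([100.0, 80.0, 60.0])
print(x)  # [1, 2, 3]
print(y)  # [100.0, 90.0, 80.0]
```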

The C-141 data were organized into twelve lots. The
number of units in each lot and the number of direct manhours
expended in the production of each lot of airframes are
provided. The data required for the cumulative average
learning curve are obtained through a series of simple
calculations discussed in the RAND Memorandum "An Introduction
to Equipment Cost Estimating" [Ref. 16:p. 104]. The
cumulative average hours are computed at the final unit in
each lot, where the cumulative average hour figures apply.
Therefore, twelve data points are used in the parameter
estimation for the C-141 cumulative average learning curve
formulation.
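For lot data, the calculation reduces to a running total evaluated at the last unit of each lot. The sketch below assumes that reading of the procedure; the function name and the three lots are hypothetical and do not reproduce the C-141 figures.

```python
def lot_cumulative_average(lot_sizes, lot_hours):
    """Return one (cumulative units, cumulative average hours) point per lot.

    Each point is plotted at the final unit of the lot, where the
    cumulative average figure applies.
    """
    points = []
    cum_units = 0
    cum_hours = 0.0
    for size, hours in zip(lot_sizes, lot_hours):
        cum_units += size
        cum_hours += hours
        points.append((cum_units, cum_hours / cum_units))
    return points


# Three hypothetical lots (not thesis data)
points = lot_cumulative_average([5, 10, 15], [5000.0, 7000.0, 9000.0])
print(points)
```

Twelve lots would produce twelve such points.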

2. Unit Learning Curve

The data requirements for the unit learning curve 
are also rather straightforward. The independent variable 
is the cumulative quantity of aircraft produced. The 
dependent variable is the amount of direct manhours expended 
in the production of the final unit of the cumulative 
quantity produced. The F-102 and C-141 adjusted data used 
to fit the unit learning curve are tabulated in Appendix B. 






The F-102 data are again easily refined to meet the
data requirements of the unit
learning curve. Table I is used to translate the raw data
of lots four through eleven into direct manhours per
airframe. The airframes were then ordered with respect to
delivery sequence number. This sequence of 204
airframes, together with each unit's direct labor hours
required for production, supplies the independent and
dependent variable data vectors for the estimation of the
parameters of the unit learning curve.

Since the C-141 production data are grouped into 
lots, a rather gross approximating technique is required to 
transform the data into the form required by the unit 
learning curve specification. The average number of labor 
hours for each lot is treated as if it were an observation 
on the labor hours required to produce the unit at the lot 
midpoint. When dealing with a log-linear relationship, the 
arithmetic midpoint produces unequal areas under the 
learning curve between the first and last units of each 
respective lot. The exact determination of a true lot 
midpoint depends on the lot quantity, type of curve hypothe- 
sized, and the true slope of the learning curve [Ref. 16: 
p. 105]. In order to avoid the shortcomings of the 

arithmetic midpoint, the algebraic midpoint, K, discussed in 
[Ref. 17:p. 44] will be used: 






$$K = \left[ \frac{m(1 + B)}{(L + .5)^{(1 + B)} - (F - .5)^{(1 + B)}} \right]^{-1/B}$$

where

m: lot quantity

B: learning curve slope

L: last unit of the lot

F: first unit of the lot

An estimate of B from Womer and Patterson's report
[Ref. 5:p. 267] is used in calculating the algebraic
midpoint. Again, twelve data points are used in the
parameter estimation for the C-141 unit learning curve
specification.
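The algebraic midpoint is straightforward to compute. The sketch below assumes B is the (negative) exponent of a unit curve of the form Y = A X^B; the function name, the lot boundaries, and the value of B are hypothetical illustrations, not the Womer and Patterson estimate.

```python
def algebraic_midpoint(first, last, b):
    """Algebraic lot midpoint K for a lot running from unit `first` to `last`.

    b is the learning curve exponent (assumed negative and not equal to -1).
    """
    m = last - first + 1                        # lot quantity
    span = (last + 0.5) ** (1 + b) - (first - 0.5) ** (1 + b)
    return (m * (1 + b) / span) ** (-1.0 / b)


# Hypothetical lot of units 11 through 30 with b near an 80 percent curve
b = -0.322
k = algebraic_midpoint(11, 30, b)
print(k)
```

For a decreasing, convex curve the result falls below the arithmetic midpoint of the lot, which is the behavior the algebraic midpoint is intended to capture.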

3. Boger, Jones, and Sontheimer Model

The data requirements for this model are based on
the statement regarding the marginal requirements for
incremental outputs of product in Boger, Jones, and
Sontheimer's paper [Ref. 14:p. 23]. That is, the product
emerges in lots or lumps, $q_t$, at discrete intervals using
discrete inputs, $x_t$, of the composite resource (direct labor
hours). Therefore, the data requirements for this model
are the quantity of units produced in each time period and the
direct labor hours expended in the production of the units
produced in each time period.

The complete data base for the F-102 program
contains total labor hours for each airframe. These data are
not in the form required for the Boger et al. model. Womer
and Gulledge took considerable care in resolving the data








problem in their study [Ref. 12:p. 85]. Their work made the
data compatible with the theoretical model they were
testing. The information concerning the F-102 program that
Womer and Gulledge discuss made it possible to apply some
further adjustments to establish a data base compatible with
the Boger et al. model.

As discussed before, the ideal data for the Boger
et al. model are the total number of aircraft produced in a
specific time period, $q_t$, and the quantity of direct labor
hours, $x_t$, expended in producing $q_t$. Although these data are
not directly available, Womer and Gulledge derived the next
best alternative: cost by lot per month. Because certain
information was not available, Womer and Gulledge were
able to approximate the cost by lot per month only for lots
four through eleven.

Tables I, II, and III along with the F-102 data base
in [Ref. 12:pp. 83-85] provided enough information to adjust
the data for lots four through eleven for use in the Boger
et al. model. The first adjustment was to use Table I and
the total labor hours expended on each airframe in lots four
through eleven to arrive at values for cumulative fabrication
and assembly hours for each airframe. As discussed
earlier, these hours comprise the direct labor hours
expended for each airframe. The next step was to calculate
the equivalent airframe units produced per month for each


[Table: F-102 PERCENT DETAIL HOURS COMPLETED, by lot and month. The table entries are not recoverable from the source.]

[Table: F-102 PERCENT ASSEMBLY HOURS COMPLETED, by lot and month. The table entries are not recoverable from the source.]



lot. This was calculated by first determining the empirical
production rates for each lot:

$$Y_f = \frac{\sum_{\text{aircraft in lot}} DMH_f}{\text{airframes in lot}} \quad \text{for lots } 4, 5, 6, \ldots, 11$$

$$Y_a = \frac{\sum_{\text{airframes in lot}} DMH_a}{\text{airframes in lot}} \quad \text{for lots } 4, 5, 6, \ldots, 11$$

$$\text{Production rate (fab)} = 1/Y_f$$

$$\text{Production rate (assem)} = 1/Y_a$$

where

$DMH_f$: direct manhours for fabrication
$DMH_a$: direct manhours for assembly

The production rates for fabrication and assembly were then
applied, in conjunction with Tables II and III, to the
cumulative fabrication and assembly hours per month per lot,
and the results were added to arrive at equivalent aircraft
produced per month per lot. These results were then summed
across lots four through eleven for each month, using
Tables II and III, to arrive at equivalent units produced per
month. Direct labor hours expended per month on the
equivalent quantity of airframes produced per month were
similarly calculated. The adjusted F-102 production data
per month for lots four through eleven for use in the Boger
et al. model are summarized in Appendix C.






The original form of the C-141 data made available
to Orsini by the Air Force Plant Representative Office was
direct manhours per lot per month, expended as direct labor
hours as defined previously, and the quantity of aircraft per
lot. Orsini then aggregated these monthly data into
quarterly data points and tabulated them as direct manhours
per lot per quarter. The adjustments made to the data by
Orsini for his analysis were compatible with the refinements
required by the Boger et al. model. The average production
rate for each lot was first determined by dividing the total
aircraft in each lot by the total amount of direct labor
hours attributed to the production of each respective lot.
This average production rate was then applied to the
tabulated quarterly data to arrive at equivalent units
produced per lot per quarter. The equivalent units produced
per lot per quarter and direct labor hours per quarter
were then summed across lots for the quarters in which each lot
was worked on to arrive at equivalent units produced per
quarter and direct labor hours expended per quarter. The
data, as refined by Orsini, used in the Boger et al. model
are tabulated in Appendix C.
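The quarterly aggregation just described can be sketched as follows. This is an illustration under stated assumptions, not Orsini's procedure as implemented: the function name, the dictionary layout, and the two lots are hypothetical.

```python
def equivalent_units_per_quarter(lot_units, lot_quarter_hours):
    """Aggregate lot data into equivalent units and hours per quarter.

    lot_units: {lot: aircraft in the lot}
    lot_quarter_hours: {lot: {quarter: direct manhours charged to the lot}}
    """
    units = {}
    hours = {}
    for lot, by_quarter in lot_quarter_hours.items():
        # Average production rate: aircraft per direct manhour for this lot
        rate = lot_units[lot] / sum(by_quarter.values())
        for quarter, h in by_quarter.items():
            units[quarter] = units.get(quarter, 0.0) + rate * h
            hours[quarter] = hours.get(quarter, 0.0) + h
    return units, hours


# Two hypothetical overlapping lots (not thesis data)
lot_units = {1: 4, 2: 6}
lot_quarter_hours = {
    1: {"Q1": 3000.0, "Q2": 1000.0},
    2: {"Q2": 1000.0, "Q3": 2000.0},
}
units, hours = equivalent_units_per_quarter(lot_units, lot_quarter_hours)
print(units, hours)
```

Because each lot's rate is its aircraft count divided by its total hours, the equivalent units summed over all quarters equal the total number of aircraft.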






IV. METHODOLOGY 



A. LINEAR REGRESSION 

Historically, it has usually been assumed that the 
relationship between the independent and dependent variables 
of a learning curve specification is log-linear. This 
assumption has made it particularly easy to estimate the 
learning curve parameters through simple linear regression 
when only one independent variable is used. In this study,
the least squares normal error regression model is
used. The normal error model is:

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \quad \text{for } i = 1, 2, 3, \ldots$$

where

$Y_i$: observed response of the $i$th trial

$X_i$: the level of the independent variable in the $i$th trial

$\beta_0, \beta_1$: regression parameters

$\epsilon_i$: residuals which are distributed $N(0, \sigma^2)$

Normality of the error terms seems reasonable since the 
residuals probably represent the accumulation of many 
effects that are omitted from the model. The cumulative
error term, $\epsilon_i$, would tend to comply with the central limit
theorem and approach normality. Since the error terms are






assumed to be normally distributed, the assumption of no
correlation between residuals becomes one of independence.
In addition, the assumption of normality allows one to perform
parametric statistical tests in evaluating the
statistical significance of the estimated parameters and the
aptness of the model.
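As an illustration of the log-linear approach, the sketch below fits a curve of the form Y = A X^B by ordinary least squares on the logarithms of both variables. It is a minimal sketch, not the software used in this study; the function name and the noise-free sample data are hypothetical.

```python
import math


def fit_log_linear(x, y):
    """Fit Y = A * X**B by simple linear regression of ln(Y) on ln(X).

    Returns (A, B). All x and y values must be positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                   # slope: the learning curve exponent
    ln_a = mean_y - b * mean_x      # intercept: ln of the first-unit value
    return math.exp(ln_a), b


# Hypothetical noise-free data generated from A = 1000, B = -0.3
x = list(range(1, 21))
y = [1000.0 * v ** -0.3 for v in x]
a_hat, b_hat = fit_log_linear(x, y)
print(a_hat, b_hat)
```

Note that this procedure assumes the error term is additive in the logarithms, which corresponds to a multiplicative error in the original units.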

B. NON-LINEAR REGRESSION 

Non-linear regression software in STATGRAPHICS [Ref. 18:
pp. 19-35] is used as an alternative method of parameter
estimation. In this procedure, least squares estimates of
the parameters of a non-linear model are determined. The
learning curve formulations in this study are inherently
non-linear when the data are in their raw form. The non-linear
model is:

$$Y_i = a X_i^b + \epsilon_i \quad \text{for } i = 1, 2, 3, \ldots$$

where

$Y_i$: observed response of the $i$th trial

$X_i$: level of the independent variable of the $i$th trial

$a, b$: regression parameters

$\epsilon_i$: residuals which are distributed $N(0, \sigma^2)$

The non-linear regression method utilized in the 
STATGRAPHICS software was developed by D. W. Marquardt and 
represents a compromise between the linearization (Taylor 
series) method and the steepest descent method of non-linear 






parameter estimation. Marquardt's compromise has been 
described as combining the best features of the lineariza- 
tion and steepest descent methods while avoiding their most 
serious limitations. A detailed discussion and references 
for this algorithm are contained in Draper and Smith's
Applied Regression Analysis, Second Edition [Ref. 19:
p. 471]. An important aspect of non-linear regression that
deviates from the linear case is worth mentioning. When the 
error term of the non-linear model is assumed to be normally 
distributed, the parameter estimates are no longer normally 
distributed and the sample residual variance is no longer an 
unbiased estimate of the residual variance. While suitable 
comparison of mean squares can be made visually, the usual 
F-tests for regression and lack of fit are not valid, in 
general, for the non-linear case [Ref. 19:p. 484]. 
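To make the idea of Marquardt's compromise concrete, the sketch below fits Y = a X^b directly in the original units with a basic damped Gauss-Newton loop. It is a simplified illustration and is not the STATGRAPHICS implementation: the function name, the starting damping factor, the factor-of-ten adjustment rule, and the sample data are all assumptions made for this example.

```python
import math


def power_curve_lm(x, y, a, b, iterations=200):
    """Least squares fit of y = a * x**b by a basic Marquardt-style loop.

    a, b are starting values. A large damping factor gives short,
    steepest-descent-like steps; a small one gives linearization
    (Gauss-Newton) steps.
    """
    def sse(a_, b_):
        return sum((yi - a_ * xi ** b_) ** 2 for xi, yi in zip(x, y))

    lam = 1e-3                        # assumed starting damping factor
    current = sse(a, b)
    for _ in range(iterations):
        # Accumulate J'J and J'r for the two parameters
        jaa = jab = jbb = ga = gb = 0.0
        for xi, yi in zip(x, y):
            p = xi ** b
            da = p                        # partial derivative w.r.t. a
            db = a * p * math.log(xi)     # partial derivative w.r.t. b
            r = yi - a * p                # residual at current estimates
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Damped 2x2 normal equations, solved by Cramer's rule
        m11 = jaa * (1.0 + lam)
        m22 = jbb * (1.0 + lam)
        det = m11 * m22 - jab * jab
        step_a = (ga * m22 - jab * gb) / det
        step_b = (m11 * gb - jab * ga) / det
        trial = sse(a + step_a, b + step_b)
        if trial < current:               # accept the step, relax damping
            a, b, current = a + step_a, b + step_b, trial
            lam /= 10.0
        else:                             # reject the step, increase damping
            lam *= 10.0
    return a, b


# Hypothetical noise-free data generated from a = 1000, b = -0.3
x = [float(i) for i in range(1, 21)]
y = [1000.0 * v ** -0.3 for v in x]
a_hat, b_hat = power_curve_lm(x, y, 500.0, -0.1)
print(a_hat, b_hat)
```

In practice, the log-linear estimates are a natural choice of starting values for such a procedure.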

C. DATA ANALYSIS 

Examination of the observed residuals of a regression 
model is an important aspect of any regression technique. 
If the model is appropriate, the observed residuals should 
reflect the properties assumed for the error term in the 
regression model. In this study, both graphical and 
statistical tests involving the residuals will be performed. 
Evaluation of the residuals of the various models to be
considered will address possible departures from the model
including: the regression model does not hold, the error






terms do not have constant variance, the error terms are not 
independent, the model fits all but one or a few outliers, 
and the error terms are not normally distributed. 

After fitting a model to the data, residuals falling 
into a horizontal band centered at zero displaying no 
systematic tendencies to be positive or negative and 
appearing to be randomly scattered would suggest the 
assumptions of the model do not appear to be violated. This 
would imply the model is well suited to the data. If this 
is not the case, remedial measures would need to be taken. 
Generally speaking, two types of remedial measures
are normally followed: abandon the model altogether, or
use some transformation on the data so that the model is
appropriate for the transformed data. In this report, only two
aspects of data transformation will be addressed:
autocorrelation and the handling of outliers. If, after these
two problems are dealt with, further residual analysis
still clearly implies the assumptions of the model are not met,
the model will be rejected.

1. Autocorrelation

The regression models of ordinary least squares or 
maximum likelihood techniques consider the stochastic 
disturbance terms, the residuals of the regression, to be 
either uncorrelated or independent normal random variables. 
In the application of regression models to learning curves,
we use time series data. The assumption of no correlation
or independence between error terms for time series data is 
often inappropriate. The observed correlation between 
residuals of regression modeling is called autocorrelation 
or serial correlation. 

Neter and Wasserman outline the problems associated 
with autocorrelation: 

i) The regular least squares regression coefficients are 
still unbiased but no longer have the minimum 
variance property and may be quite inefficient. 

ii) The mean squared error (MSE) may seriously 
underestimate the variance of the error terms. 

iii) The estimated standard deviation of the regression
coefficients may be seriously underestimated and $R^2$
may be overestimated.

iv) The confidence intervals and tests using the 
student's t and F distributions are no longer 
strictly applicable. [Ref. 20:p. 352] 

In this study, the existence of first-order autocorrelation,
AR(1), will be investigated graphically and
will be statistically tested using the Durbin-Watson test.
If the examination of the residuals indicates that autocorrelation
exists, this information will be used to improve the
regression model. The autocorrelation will be modeled and
accounted for in a transformation of the model data.
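The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below computes it for residuals listed in time order; the function name and the two residual series are hypothetical illustrations.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residual series (not thesis data)
persistent = [1.0, 1.0, 1.0, 1.0]       # successive residuals identical
alternating = [1.0, -1.0, 1.0, -1.0]    # successive residuals flip sign
print(durbin_watson(persistent), durbin_watson(alternating))
```

Values near 2 are consistent with no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The formal test compares the statistic with tabulated lower and upper bounds.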

The first-order autocorrelation error model 
discussed by Neter and Wasserman [Ref. 20:p. 353] for a 

simple linear regression is: 






